Redact sensitive information in catalog queries #24563

Draft
wants to merge 5 commits into
base: master

piotrrzysko
Member

@piotrrzysko piotrrzysko commented Dec 23, 2024

Description

This is a follow-up to #24562 that introduces redaction of security-sensitive information in statements containing connector properties, specifically:

  • CREATE CATALOG
  • EXPLAIN CREATE CATALOG
  • EXPLAIN ANALYZE CREATE CATALOG

The current approach is as follows:

  • For syntactically valid statements, only properties containing sensitive information are masked.
  • If a valid query references a nonexistent connector, all properties are masked.
  • If a query fails before or during parsing, the entire query is masked.
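
The three rules above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the class name `CatalogPropertyRedaction`, the method names, and the `***` mask value are all assumptions made for the example.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

// Hypothetical helper illustrating the masking rules; not part of the PR.
public class CatalogPropertyRedaction
{
    // Assumed mask value, chosen only for this sketch.
    static final String REDACTED = "***";

    // sensitiveNames is empty when the connector referenced by the
    // statement does not exist; in that case every property is masked.
    static Map<String, String> redactProperties(
            Map<String, String> properties,
            Optional<Set<String>> sensitiveNames)
    {
        Map<String, String> result = new LinkedHashMap<>();
        for (Map.Entry<String, String> entry : properties.entrySet()) {
            boolean mask = sensitiveNames
                    .map(names -> names.contains(entry.getKey()))
                    .orElse(true);
            result.put(entry.getKey(), mask ? REDACTED : entry.getValue());
        }
        return result;
    }

    // A query that cannot be parsed is masked as a whole, because there is
    // no reliable way to tell where its properties are.
    static String redactUnparseableQuery(String query)
    {
        return REDACTED;
    }
}
```

Masking everything for an unknown connector errs on the side of hiding too much, since without the connector there is no way to know which properties are sensitive.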

Redacted queries are returned through the REST API, the system.runtime.queries table, and query events (QueryCreatedEvent and QueryCompletedEvent).

Note that this PR currently includes two commits from #24562.

Additional context and related issues

Release notes

( ) This is not user-visible or is docs only, and no release notes are required.
( ) Release notes are required. Please propose a release note for me.
(x) Release notes are required, with the following suggested text:

## Section
* Redact sensitive information in statements containing connector properties. ({issue}`23106`)

The SPI will be used by the engine to redact security-sensitive
information in statements that manage catalogs. It has been added at the
connector factory level, rather than the connector level, to allow more
flexibility in retrieving properties. In some cases, we want to perform
redaction before a connector is initialized, for example, when we create
a new catalog by issuing the CREATE CATALOG statement.
Exposed properties fall into one of the following categories: they are
either explicitly marked as security-sensitive or are unknown. The
connector assumes that unknown properties might be misspelled
security-sensitive properties.
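
As an illustration of the two categories, a factory-level hook might look roughly like the sketch below. The interface `CatalogFactory`, its method, and the property names are hypothetical stand-ins chosen for this example, not the actual SPI signature.

```java
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

public class FactoryLevelRedactionSketch
{
    // Hypothetical stand-in for a connector factory. The hook only needs the
    // catalog configuration, so it can be called before a connector exists.
    interface CatalogFactory
    {
        Set<String> securitySensitivePropertyNames(Map<String, String> config);
    }

    static final class ExampleFactory
            implements CatalogFactory
    {
        // Assumed property names, for illustration only.
        private static final Set<String> KNOWN =
                Set.of("connection-url", "connection-user", "connection-password");
        private static final Set<String> SENSITIVE = Set.of("connection-password");

        @Override
        public Set<String> securitySensitivePropertyNames(Map<String, String> config)
        {
            // Expose a property if it is marked sensitive, or if it is unknown
            // and could therefore be a misspelled sensitive property.
            return config.keySet().stream()
                    .filter(name -> SENSITIVE.contains(name) || !KNOWN.contains(name))
                    .collect(Collectors.toUnmodifiableSet());
        }
    }
}
```

With this sketch, a config containing `connection-url`, `connection-password`, and the misspelled `connection-pasword` would expose the last two names and leave `connection-url` unmasked.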

The purpose of the included test is to identify security-sensitive
properties that may be used by the connector. It uses the output
generated by the maven-dependency-plugin, configured in the connector's
pom.xml file. This output contains the connector's runtime classpath,
which is then scanned to identify all property names annotated with
@Config. Scanning the classpath ensures that all configuration classes
are included, even those used conditionally.
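
The annotation scan can be approximated with plain reflection, as in the sketch below. It is a simplification under stated assumptions: the nested `@Config` and `@Sensitive` annotations are local stand-ins rather than the real configuration annotations, and the classes are passed in directly instead of being discovered from the classpath file written by the maven-dependency-plugin.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

public class ConfigPropertyScanSketch
{
    // Local stand-in for the annotation that names a config property.
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.METHOD)
    @interface Config
    {
        String value();
    }

    // Local stand-in for a marker that flags a property as security-sensitive.
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.METHOD)
    @interface Sensitive {}

    // Hypothetical configuration class used only to exercise the scan.
    static class ExampleConfig
    {
        @Config("connection-url")
        public void setUrl(String url) {}

        @Config("connection-password")
        @Sensitive
        public void setPassword(String password) {}
    }

    // Collects the names of all properties that are both named via @Config
    // and flagged with @Sensitive on the given classes.
    static Set<String> sensitivePropertyNames(List<Class<?>> classes)
    {
        Set<String> names = new TreeSet<>();
        for (Class<?> clazz : classes) {
            for (Method method : clazz.getDeclaredMethods()) {
                Config config = method.getAnnotation(Config.class);
                if (config != null && method.isAnnotationPresent(Sensitive.class)) {
                    names.add(config.value());
                }
            }
        }
        return names;
    }
}
```

A test could then compare this set with the names the connector factory reports and fail when a sensitive property is missing from the factory's list.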
This commit introduces redacting of security-sensitive information in
statements containing connector properties, specifically:

* CREATE CATALOG
* EXPLAIN CREATE CATALOG
* EXPLAIN ANALYZE CREATE CATALOG

The current approach is as follows:

* For syntactically valid statements, only properties containing
sensitive information are masked.
* If a valid query references a nonexistent connector, all properties
are masked.
* If a query fails before or during parsing, the entire query is masked.

The redacted form is created in DispatchManager and is propagated to
all places that create QueryInfo and BasicQueryInfo. Before this
change, QueryInfo/BasicQueryInfo stored the raw query text received from
the end user. From now on, the text will be altered for the cases listed
above.
@JsonCreator for TrimmedBasicQueryInfo was introduced to facilitate
the deserialization of server responses in tests.
@piotrrzysko piotrrzysko force-pushed the redact-sensitive-queries branch from 98470bb to ed595a1 December 23, 2024 13:47
Development

Successfully merging this pull request may close these issues.

Redact properties from CREATE CATALOG in query info, so they are not present in any outputs